A computational auditory scene analysis system for speech segregation and robust speech recognition
Authors
Abstract
A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that our system yields substantial improvement over the baseline performance.
Index Terms: speech segregation, computational auditory scene analysis, binary time-frequency mask, robust speech recognition, uncertainty decoding
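The ideal binary mask defined in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the paper estimates the mask from the mixture alone, whereas the sketch below assumes the per-unit target and interference energies are known in advance. The function names `ideal_binary_mask` and `apply_mask`, the toy energy values, and the approximation of mixture energy as the sum of the two source energies are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def ideal_binary_mask(target_energy: Matrix, interference_energy: Matrix) -> List[List[int]]:
    """Return 1 for each T-F unit where the target is strictly stronger
    than the interference, and 0 otherwise."""
    if len(target_energy) != len(interference_energy):
        raise ValueError("inputs must have the same number of rows")
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        if len(t_row) != len(i_row):
            raise ValueError("inputs must have the same number of columns")
        mask.append([1 if t > i else 0 for t, i in zip(t_row, i_row)])
    return mask


def apply_mask(mixture_energy: Matrix, mask: List[List[int]]) -> Matrix:
    """Retain the mixture in units where the mask is 1 and zero it elsewhere."""
    return [
        [m * k for m, k in zip(m_row, k_row)]
        for m_row, k_row in zip(mixture_energy, mask)
    ]


# Toy example (hypothetical values): rows are frequency channels, columns are time frames.
target = [[4.0, 0.5, 2.0], [0.1, 3.0, 1.0]]
interference = [[1.0, 2.0, 2.0], [0.2, 0.5, 4.0]]

# Assumption for illustration: mixture energy is approximated by the sum of source energies.
mixture = [[t + i for t, i in zip(t_row, i_row)] for t_row, i_row in zip(target, interference)]

mask = ideal_binary_mask(target, interference)
masked = apply_mask(mixture, mask)
```

Units where the two energies are equal are assigned 0 here, matching the "stronger than" wording of the definition. In a real system the premixed signals are unavailable, which is why the mask has to be estimated from cues such as harmonicity, onsets and offsets, and speaker characteristics.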
Similar resources
Difficulties in segregating concurrent speech sounds in hearing-impaired children
Objective: This study was a basic investigation of the ability of concurrent speech segregation in hearing impaired children. Concurrent segregation is one of the fundamental components of auditory scene analysis and plays an important role in speech perception. In the present study, we compared auditory late responses or ALRs between hearing impaired and normal children. Materials & Methods...
Monaural segregation of voiced speech using discriminative random fields
Techniques for separating speech from background noise and other sources of interference have important applications for robust speech recognition and speech enhancement. Many traditional computational auditory scene analysis (CASA) based approaches decompose the input mixture into a time-frequency (T-F) representation, and attempt to identify the T-F units where the target energy dominates tha...
Challenge Problem for Computational Auditory Scene Analysis: Understanding Three Simultaneous Speeches
Understanding three simultaneous speeches is proposed as a challenge problem to foster artificial intelligence, speech and sound understanding or recognition, and computational auditory scene analysis research. Automatic speech recognition under noisy environments is attacked by speech enhancement techniques such as noise reduction and speaker adaptation. However, the signal-to-noise ratio of sp...
An Auditory Scene Analysis Approach to Monaural Speech Segregation
A human listener has the remarkable ability to segregate an acoustic mixture and attend to a target sound. This perceptual process is called auditory scene analysis (ASA). Moreover, the listener can accomplish much of auditory scene analysis with only one ear. Research in ASA has inspired many studies in computational auditory scene analysis (CASA) for sound segregation. In this chapter we intr...
Martin Cooke, Phil Green and Malcolm Crawford: Handling Missing Data in Speech Recognition
In this paper, we propose a new paradigm for robust ASR based on auditory scene analysis. In previous work, we have shown how models of auditory processing and grouping principles can be used to separate the evidence for a speech signal from arbitrary intrusions. However, this evidence will generally be incomplete since some spectrotemporal regions will be dominated by the other sources. Here, ...
Journal: Computer Speech & Language
Volume: 24
Publication date: 2010